Constrained ordination analysis with enrichment of bell-shaped response functions
Constrained ordination methods aim at finding an environmental gradient along which the species abundances are maximally separated. According to fundamental niche theory in ecology, the species response functions, which describe the expected abundance as a function of the environmental score, are only meaningful if they are bell-shaped. Many classical model-based ordination methods, however, use quadratic regression models without imposing the bell shape, thus allowing meaningless U-shaped response functions. The analysis output (e.g. a biplot) may therefore be misleading and the conclusions prone to error. In this paper we present a log-likelihood ratio criterion with a penalisation term to enforce more bell-shaped response functions. We report the results of a simulation study and apply our method to metagenomics data from microbial ecology.
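In a Poisson quadratic response model, log mu = b0 + b1*x + b2*x^2, the curve is bell-shaped exactly when b2 < 0 (with optimum at -b1/(2*b2)) and U-shaped when b2 > 0. A minimal Python sketch of the penalisation idea, with a hypothetical penalty weight `lam` (the paper's actual criterion is a penalised log-likelihood ratio; this only illustrates how a penalty can discourage U-shapes):

```python
import numpy as np

def quadratic_response(x, b0, b1, b2):
    """Expected abundance under a quadratic model on the log scale:
    log mu = b0 + b1*x + b2*x^2. Bell-shaped iff b2 < 0."""
    return np.exp(b0 + b1 * x + b2 * x ** 2)

def penalised_neg_log_lik(params, x, y, lam=10.0):
    """Poisson negative log-likelihood plus a penalty that discourages
    U-shaped (b2 > 0) response functions; `lam` is a hypothetical weight."""
    b0, b1, b2 = params
    mu = quadratic_response(x, b0, b1, b2)
    nll = np.sum(mu - y * np.log(mu + 1e-12))
    penalty = lam * max(b2, 0.0) ** 2   # zero whenever the curve is bell-shaped
    return nll + penalty
```

Minimising this criterion over (b0, b1, b2), e.g. with `scipy.optimize.minimize`, pulls the fitted curvature towards bell shapes without hard-constraining b2.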
Tests of fit for the logarithmic distribution
Smooth tests for the logarithmic distribution are compared with three tests: the first, due to Epps, is based on the probability generating function; the second is the Anderson-Darling test; and the third, due to Klar, is based on the empirical integrated distribution function. These tests all have substantially better power than the traditional Pearson-Fisher chi-squared test of fit for the logarithmic distribution. These traditional chi-squared tests are the only logarithmic tests of fit commonly applied by ecologists and other scientists.
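For reference, the traditional Pearson-Fisher chi-squared test the abstract mentions can be sketched using SciPy's log-series distribution (`scipy.stats.logser`). Here the parameter p is taken as known for simplicity; in practice it is estimated from the data and one further degree of freedom is subtracted:

```python
import numpy as np
from scipy import stats

def pearson_chi2_logarithmic(counts, p):
    """Pearson chi-squared test of fit for the logarithmic (log-series)
    distribution. `counts[k-1]` is the observed frequency of value k;
    `p` is the distribution parameter (assumed known here)."""
    k = np.arange(1, len(counts) + 1)
    n = counts.sum()
    expected = n * stats.logser.pmf(k, p)
    # lump the right tail into the last cell so expected counts stay sensible
    expected[-1] += n * (1 - stats.logser.cdf(len(counts), p))
    x2 = ((counts - expected) ** 2 / expected).sum()
    df = len(counts) - 1   # subtract one more df if p was estimated
    return x2, stats.chi2.sf(x2, df)
```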
SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data
SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is flexible enough to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects.
On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments
Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets.
Result: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined.
Conclusion: For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike typical cost-saving strategies, such as reducing sequencing depth or the number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples, or pooling in conjunction with a moderate reduction of the sequencing depth, can be a good option to optimize the cost and maintain the power.
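The variance argument behind pooling can be illustrated with a toy model: each pool averages the biological variation of its constituent samples, while technical (sequencing) noise is added once per pool. All distributional choices below are hypothetical stand-ins for the paper's data generating model:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_group(n_pools, pool_size, mean=8.0, bio_sd=1.0, tech_sd=0.3):
    """Simulate per-pool log-expression for one gene: each pool averages
    `pool_size` biological replicates, then one sequencing measurement is
    taken per pool (toy model: pooling averages biological variation but
    not technical noise)."""
    bio = rng.normal(mean, bio_sd, size=(n_pools, pool_size)).mean(axis=1)
    return bio + rng.normal(0.0, tech_sd, size=n_pools)

# Variance across pools shrinks as pool size grows:
# roughly bio_sd^2 / pool_size + tech_sd^2.
v1 = simulate_group(2000, 1).var()
v4 = simulate_group(2000, 4).var()
```

With the same number of sequenced units, pools of size 4 cut the between-unit variance substantially, which is the mechanism by which pooling can recover power lost to fewer replicates.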
A method to search for optimal field allocations of transgenic maize in the context of co-existence
Spatially isolating genetically modified (GM) maize fields from non-GM maize fields is a robust on-farm measure to keep the adventitious presence of GM material in the harvest of neighboring fields due to cross-fertilization below the European labeling threshold of 0.9%. However, the implementation of mandatory and rigid isolation perimeters can affect the farmers' freedom of choice to grow GM maize on their fields if neighboring farmers do not concur with their respective cropping intentions and crop plans. To minimize the presence of non-GM maize within isolation perimeters implemented around GM maize fields, a method was developed for optimally allocating GM maize to a particular set of fields. Using a Geographic Information System dataset and Monte Carlo analyses, three scenarios were tested in a maize cultivation area with a low maize share in Flanders (Belgium). It was assumed that some farmers would act in collaboration by sharing the allocation of all their arable land for the cultivation of GM maize. From the large number of possible allocations of GM maize to any field of the shared pool of arable land, the best field combinations were selected. Compared to a random allocation of GM maize, the best field combinations made it possible to reduce spatial co-existence problems, since at least two times fewer non-GM maize fields and their corresponding farmers occurred within the implemented isolation perimeters. In the selected field sets, the mean field size was always larger than the mean field size of the common pool of arable land. These preliminary data confirm that the optimal allocation of GM maize over the landscape might theoretically be a valuable option to facilitate the implementation of rigid isolation perimeters imposed by law.
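The Monte Carlo allocation search can be sketched on a toy landscape of field centroids; the coordinates, shared pool, isolation distance and conflict criterion below are all hypothetical stand-ins for the GIS data and scenarios used in the study:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical toy landscape: centroids (x, y in metres) of all maize fields,
# and a shared pool of fields offered for GM cultivation.
fields = rng.uniform(0, 10_000, size=(60, 2))
pool = rng.choice(60, size=20, replace=False)
ISOLATION = 500.0   # illustrative isolation perimeter radius (m)

def conflicts(gm_idx):
    """Count non-GM maize fields whose centroid falls inside the isolation
    perimeter around any GM field (the co-existence problem to minimize)."""
    non_gm = np.setdiff1d(np.arange(len(fields)), gm_idx)
    d = np.linalg.norm(fields[gm_idx][:, None] - fields[non_gm][None], axis=2)
    return int((d.min(axis=0) < ISOLATION).sum())

# Monte Carlo search: draw many candidate GM allocations of 5 fields from the
# shared pool and keep the combination with the fewest conflicts.
best = min((rng.choice(pool, size=5, replace=False) for _ in range(2000)),
           key=conflicts)
```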
Goodness-of-fit tests based on sample space partitions: a unifying overview
Recently the authors have proposed tests for the one-sample and the k-sample problem, and a test for independence. All three tests are based on sample space partitions, but they were originally developed in different papers. Here we give an overview of the construction of these tests, stressing the common underlying concept of “sample space partitions”.
SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups
Background
To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of replicates per tissue or they can’t handle replicates at all.
Results
We describe a non-parametric specificity score that is compatible with unequal sample group sizes. To demonstrate its usefulness, the specificity score was calculated on all GTEx samples, detecting known and novel tissue-specific genes. A webtool was developed to browse these results for genes or tissues of interest. An example python implementation of SPECS is available at https://github.com/celineeveraert/SPECS. The precalculated SPECS results on the GTEx data are available through a user-friendly browser at specs.cmgg.be.
Conclusions
SPECS is a non-parametric method that identifies known and novel specifically expressed genes. In addition, SPECS could be adapted for other features and applications.
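The rank-based idea behind such a specificity score can be sketched as follows. This is an illustrative statistic (the minimum, over all other tissues, of the probability that a target-tissue sample exceeds a sample from that tissue), not the exact SPECS definition; by construction it needs no equal group sizes:

```python
import numpy as np

def specificity_score(target, others):
    """Toy non-parametric specificity sketch: for each non-target tissue,
    estimate P(target sample > other sample) over all pairs, then take the
    minimum over tissues. Groups may have unequal sizes."""
    t = np.asarray(target, float)
    probs = []
    for o in others:
        o = np.asarray(o, float)
        probs.append((t[:, None] > o[None, :]).mean())  # pairwise win rate
    return min(probs)
```

A score near 1 means the feature is consistently higher in the target tissue than in every other tissue; a score near 0.5 or below indicates no specificity.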
Small sample inference for probabilistic index models
Probabilistic index models may be used to generate classical and new rank tests, with the additional advantage of supplementing them with interpretable effect size measures. The popularity of rank tests for small sample inference makes probabilistic index models also natural candidates for small sample studies. However, at present, inference for such models relies on asymptotic theory that can deliver poor approximations of the sampling distribution if the sample size is rather small. A bias-reduced version of the bootstrap and adjusted jackknife empirical likelihood are explored. It is shown that their application leads to drastic improvements in small sample inference for probabilistic index models, justifying the use of such models for reliable and informative statistical inference in small sample studies.
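The probabilistic index underlying such models, P(Y1 < Y2) + 0.5 P(Y1 = Y2), which is also the effect size behind the Wilcoxon rank-sum test, can be estimated directly from two samples. A minimal sketch (point estimation only; the small-sample bootstrap and jackknife corrections studied in the paper are not shown):

```python
import numpy as np

def probabilistic_index(y1, y2):
    """Estimate the probabilistic index P(Y1 < Y2) + 0.5 * P(Y1 = Y2)
    from all pairwise comparisons between two samples."""
    y1 = np.asarray(y1, float)
    y2 = np.asarray(y2, float)
    less = (y1[:, None] < y2[None, :]).mean()   # fraction of pairs with y1 < y2
    ties = (y1[:, None] == y2[None, :]).mean()  # tied pairs count half
    return less + 0.5 * ties
```

A value of 0.5 corresponds to no effect; values near 0 or 1 indicate strong stochastic ordering between the groups.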